<!DOCTYPE html>
<html >

<head>

  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <title>Code for "Categorical Data Analysis"</title>
  <meta name="description" content="Code for &quot;Categorical Data Analysis&quot;.">
  <meta name="generator" content="bookdown 0.7 and GitBook 2.6.7">

  <meta property="og:title" content="Code for &quot;Categorical Data Analysis&quot;" />
  <meta property="og:type" content="book" />
  
  
  <meta property="og:description" content="Code for &quot;Categorical Data Analysis&quot;." />
  

  <meta name="twitter:card" content="summary" />
  <meta name="twitter:title" content="Code for &quot;Categorical Data Analysis&quot;" />
  
  <meta name="twitter:description" content="Code for &quot;Categorical Data Analysis&quot;." />
  



<meta name="date" content="2018-09-05">

  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-status-bar-style" content="black">
  
  
<link rel="prev" href="logistic-regression.html">
<link rel="next" href="multi-logit-model.html">
<script src="libs/jquery/jquery.min.js"></script>
<link href="libs/gitbook/css/style.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-fontsettings.css" rel="stylesheet" />









<style type="text/css">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
  { position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
  { content: attr(data-line-number);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; pointer-events: all; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {  }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>

<link rel="stylesheet" href="css/style.css" type="text/css" />
</head>

<body>



  <div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">

    <div class="book-summary">
      <nav role="navigation">


      </nav>
    </div>

    <div class="book-body">
      <div class="body-inner">
        <div class="book-header" role="navigation">
          <h1>
            <i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Code for "Categorical Data Analysis"</a>
          </h1>
        </div>

        <div class="page-wrapper" tabindex="-1" role="main">
          <div class="page-inner">

            <section class="normal" id="section-">
<div id="build-and-apply-logistic-model" class="section level1">
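<h1><span class="header-section-number">Chapter 5</span> Building and Applying Logistic Regression Models</h1>
<div id="model-selection" class="section level2">
<h2><span class="header-section-number">5.1</span> Strategies in Model Selection</h2>
<div id="母鲎及其追随者模型选择" class="section level3 unnumbered">
<h3>Female Horseshoe Crabs and Their Satellites (Model Selection)</h3>
<p>In this example, the initial model includes four predictors: weight, width, spine condition, and color. Spine condition and color are categorical (factor) variables, but in the <code>horseshoecrabs</code> dataset both are stored as numeric codes, so they must be converted to factors before fitting.</p>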
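<p>As a small illustration of how the <code>levels</code> argument fixes the baseline category, the sketch below uses a toy vector of codes (the values are hypothetical and are not taken from <code>horseshoecrabs</code>). It assumes R's default treatment contrasts.</p>

```r
# Toy codes 1..3 standing in for a numerically coded categorical variable
# (hypothetical values, not the horseshoecrabs data).
spine <- c(1, 2, 3, 3, 1, 2)

# Listing the levels in the order 3, 2, 1 makes 3 the first (reference) level.
spine_factor <- factor(spine, levels = 3:1)
levels(spine_factor)

# With the default treatment contrasts, the design matrix has dummy columns
# for levels 2 and 1 only; level 3 is absorbed into the intercept.
colnames(model.matrix(~ spine_factor))
```

<p>This is why the fitted models report coefficients such as <code>Spine_factor2</code> and <code>Spine_factor1</code>: each one compares that level with the baseline level 3.</p>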
<div class="sourceCode" id="cb257"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb257-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb257-2" data-line-number="2"><span class="kw">library</span>(dplyr)</a>
<a class="sourceLine" id="cb257-3" data-line-number="3"><span class="kw">data</span>(horseshoecrabs)</a>
<a class="sourceLine" id="cb257-4" data-line-number="4">horseshoecrabs &lt;-<span class="st"> </span>horseshoecrabs <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb257-5" data-line-number="5"><span class="st">  </span><span class="kw">mutate</span>(</a>
<a class="sourceLine" id="cb257-6" data-line-number="6">    <span class="dt">psat =</span> <span class="kw">as.integer</span>(Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span>),  <span class="co"># psat: 1 if the crab has at least one satellite, else 0</span></a>
<a class="sourceLine" id="cb257-7" data-line-number="7">    <span class="dt">Spine_factor =</span> <span class="kw">factor</span>(Spine, <span class="dt">levels =</span> <span class="dv">3</span><span class="op">:</span><span class="dv">1</span>),  <span class="co"># spine condition as a factor; level 3 is the baseline</span></a>
<a class="sourceLine" id="cb257-8" data-line-number="8">    <span class="dt">Color_factor =</span> <span class="kw">factor</span>(Color, <span class="dt">levels =</span> <span class="dv">4</span><span class="op">:</span><span class="dv">1</span>)  <span class="co"># color as a factor; level 4 is the baseline</span></a>
<a class="sourceLine" id="cb257-9" data-line-number="9">  )</a>
<a class="sourceLine" id="cb257-10" data-line-number="10"></a>
<a class="sourceLine" id="cb257-11" data-line-number="11">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb257-12" data-line-number="12">  psat <span class="op">~</span><span class="st"> </span>Weight <span class="op">+</span><span class="st"> </span>Width <span class="op">+</span><span class="st"> </span>Spine_factor <span class="op">+</span><span class="st"> </span>Color_factor, </a>
<a class="sourceLine" id="cb257-13" data-line-number="13">  <span class="dt">family =</span> <span class="kw">binomial</span>(), <span class="dt">data =</span> horseshoecrabs</a>
<a class="sourceLine" id="cb257-14" data-line-number="14">)</a>
<a class="sourceLine" id="cb257-15" data-line-number="15"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Weight + Width + Spine_factor + Color_factor, 
##     family = binomial(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.198  -0.942   0.485   0.849   2.120  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(&gt;|z|)   
## (Intercept)     -9.273      3.838   -2.42   0.0157 * 
## Weight           0.826      0.704    1.17   0.2407   
## Width            0.263      0.195    1.35   0.1779   
## Spine_factor2   -0.496      0.629   -0.79   0.4302   
## Spine_factor1   -0.400      0.503   -0.80   0.4259   
## Color_factor3    1.120      0.593    1.89   0.0591 . 
## Color_factor2    1.506      0.567    2.66   0.0079 **
## Color_factor1    1.609      0.936    1.72   0.0855 . 
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 185.20  on 165  degrees of freedom
## AIC: 201.2
## 
## Number of Fisher Scoring iterations: 4</code></pre>
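<p>The null and residual deviances reported by <code>summary(m1)</code> can be combined into a likelihood-ratio test of whether all non-intercept coefficients are zero. The following is a minimal sketch that copies the numbers from the output above rather than refitting the model; the variable names are illustrative.</p>

```r
# Deviances and residual degrees of freedom copied from the summary(m1) output.
null_dev  <- 225.76; null_df  <- 172
resid_dev <- 185.20; resid_df <- 165

lr_stat <- null_dev - resid_dev   # likelihood-ratio statistic
lr_df   <- null_df - resid_df     # number of non-intercept coefficients
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = p_value)
```

<p>The statistic is 40.56 on 7 degrees of freedom, so the predictors are jointly informative even though most of the individual Wald tests in the coefficient table are not significant at the 0.05 level.</p>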
<div class="sourceCode" id="cb259"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb259-1" data-line-number="1">m2 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb259-2" data-line-number="2">  psat <span class="op">~</span><span class="st"> </span>Weight <span class="op">+</span><span class="st"> </span>Width <span class="op">+</span><span class="st"> </span>Spine_factor <span class="op">+</span><span class="st"> </span>Color_factor <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb259-3" data-line-number="3"><span class="st">    </span>Color_factor <span class="op">*</span><span class="st"> </span>Spine_factor <span class="op">+</span><span class="st"> </span>Width <span class="op">*</span><span class="st"> </span>Color_factor <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb259-4" data-line-number="4"><span class="st">    </span>Width <span class="op">*</span><span class="st"> </span>Spine_factor, </a>
<a class="sourceLine" id="cb259-5" data-line-number="5">  <span class="dt">family =</span> <span class="kw">binomial</span>(), </a>
<a class="sourceLine" id="cb259-6" data-line-number="6">  <span class="dt">data =</span> horseshoecrabs</a>
<a class="sourceLine" id="cb259-7" data-line-number="7">)</a>
<a class="sourceLine" id="cb259-8" data-line-number="8"><span class="kw">summary</span>(m2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Weight + Width + Spine_factor + Color_factor + 
##     Color_factor * Spine_factor + Width * Color_factor + Width * 
##     Spine_factor, family = binomial(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.176  -0.885   0.458   0.773   1.923  
## 
## Coefficients:
##                              Estimate Std. Error z value
## (Intercept)                 -6.48e-01   7.66e+00   -0.08
## Weight                       1.04e+00   7.54e-01    1.38
## Width                       -8.79e-02   3.26e-01   -0.27
## Spine_factor2               -1.72e+01   3.96e+03    0.00
## Spine_factor1               -1.80e+01   3.96e+03    0.00
## Color_factor3               -1.61e+01   1.00e+01   -1.61
## Color_factor2               -3.14e+00   8.85e+00   -0.35
## Color_factor1               -2.11e+01   3.96e+03   -0.01
## Spine_factor2:Color_factor3  1.63e+01   3.96e+03    0.00
## Spine_factor1:Color_factor3  3.34e+01   4.48e+03    0.01
## Spine_factor2:Color_factor2  1.57e+01   3.96e+03    0.00
## Spine_factor1:Color_factor2  1.69e+01   3.96e+03    0.00
## Spine_factor2:Color_factor1  5.27e+01   6.25e+03    0.01
## Spine_factor1:Color_factor1  3.61e+01   5.59e+03    0.01
## Width:Color_factor3          6.70e-01   3.94e-01    1.70
## Width:Color_factor2          1.84e-01   3.43e-01    0.54
## Width:Color_factor1          1.45e-01   7.88e-01    0.18
## Width:Spine_factor2          9.81e-03   6.70e-01    0.01
## Width:Spine_factor1          1.77e-02   2.89e-01    0.06
##                             Pr(&gt;|z|)  
## (Intercept)                    0.933  
## Weight                         0.167  
## Width                          0.788  
## Spine_factor2                  0.997  
## Spine_factor1                  0.996  
## Color_factor3                  0.108  
## Color_factor2                  0.723  
## Color_factor1                  0.996  
## Spine_factor2:Color_factor3    0.997  
## Spine_factor1:Color_factor3    0.994  
## Spine_factor2:Color_factor2    0.997  
## Spine_factor1:Color_factor2    0.997  
## Spine_factor2:Color_factor1    0.993  
## Spine_factor1:Color_factor1    0.995  
## Width:Color_factor3            0.089 .
## Width:Color_factor2            0.591  
## Width:Color_factor1            0.854  
## Width:Spine_factor2            0.988  
## Width:Spine_factor1            0.951  
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 171.66  on 154  degrees of freedom
## AIC: 209.7
## 
## Number of Fisher Scoring iterations: 16</code></pre>
<div class="sourceCode" id="cb261"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb261-1" data-line-number="1"><span class="kw">anova</span>(m2)</a></code></pre></div>
<pre><code>## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: psat
## 
## Terms added sequentially (first to last)
## 
## 
##                           Df Deviance Resid. Df Resid. Dev
## NULL                                        172        226
## Weight                     1    30.02       171        196
## Width                      1     2.85       170        193
## Spine_factor               2     0.09       168        193
## Color_factor               3     7.60       165        185
## Spine_factor:Color_factor  6     9.61       159        176
## Width:Color_factor         3     3.93       156        172
## Width:Spine_factor         2     0.00       154        172</code></pre>
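<p>Because the table adds terms sequentially, the deviance reductions of the three interaction rows sum to the likelihood-ratio statistic for comparing the main-effects model with the model that includes all two-way interactions listed here. The sketch below is a hedged illustration that uses only the values printed in the table; the data frame and its column names are made up for the example.</p>

```r
# Rows of the sequential analysis-of-deviance table for the interaction terms
# (values copied from the anova(m2) output).
interaction_terms <- data.frame(
  term     = c("Spine_factor:Color_factor", "Width:Color_factor", "Width:Spine_factor"),
  df       = c(6, 3, 2),
  deviance = c(9.61, 3.93, 0.00)
)

lr_stat <- sum(interaction_terms$deviance)  # equals 185.20 - 171.66
lr_df   <- sum(interaction_terms$df)        # equals 165 - 154
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = p_value)
```

<p>The statistic is 13.54 on 11 degrees of freedom, which gives no strong evidence that the interactions improve the fit. The very large standard errors in <code>summary(m2)</code> suggest sparse data for some factor combinations, so the chi-squared approximation should be read with caution.</p>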
<div class="sourceCode" id="cb263"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb263-1" data-line-number="1"><span class="co"># Backward stepwise selection</span></a>
<a class="sourceLine" id="cb263-2" data-line-number="2">m2_backward &lt;-<span class="st"> </span><span class="kw">step</span>(m2, <span class="dt">direction =</span> <span class="st">&quot;backward&quot;</span>, <span class="dt">trace =</span> <span class="ot">FALSE</span>)</a>
<a class="sourceLine" id="cb263-3" data-line-number="3">m2_backward</a></code></pre></div>
<pre><code>## 
## Call:  glm(formula = psat ~ Width + Color_factor, family = binomial(), 
##     data = horseshoecrabs)
## 
## Coefficients:
##   (Intercept)          Width  Color_factor3  Color_factor2  
##       -12.715          0.468          1.106          1.402  
## Color_factor1  
##         1.330  
## 
## Degrees of Freedom: 172 Total (i.e. Null);  168 Residual
## Null Deviance:       226 
## Residual Deviance: 187   AIC: 197</code></pre>
<div class="sourceCode" id="cb265"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb265-1" data-line-number="1"><span class="co"># Stepwise selection with default arguments (with no scope given, step() defaults to backward elimination)</span></a>
<a class="sourceLine" id="cb265-2" data-line-number="2">m2_step &lt;-<span class="st"> </span><span class="kw">step</span>(m2, <span class="dt">trace =</span> <span class="ot">FALSE</span>) </a>
<a class="sourceLine" id="cb265-3" data-line-number="3"><span class="kw">summary</span>(m2_step)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Width + Color_factor, family = binomial(), 
##     data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.112  -0.985   0.524   0.851   2.141  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)    -12.715      2.762   -4.60  4.1e-06 ***
## Width            0.468      0.106    4.43  9.3e-06 ***
## Color_factor3    1.106      0.592    1.87    0.062 .  
## Color_factor2    1.402      0.548    2.56    0.011 *  
## Color_factor1    1.330      0.853    1.56    0.119    
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 187.46  on 168  degrees of freedom
## AIC: 197.5
## 
## Number of Fisher Scoring iterations: 4</code></pre>
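<p>As a quick check on why <code>step()</code> prefers the smaller model, the AIC values printed above can be reproduced by hand. This is a minimal sketch that only reuses the residual deviances and degrees of freedom from the two summaries; the variable names are illustrative. It relies on the fact that for ungrouped 0/1 responses the saturated log-likelihood is zero, so AIC equals the residual deviance plus twice the number of coefficients.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Residual deviances and residual df copied from the two summaries above
n &lt;- 173                                  # null df (172) + 1
dev_full &lt;- 171.66; k_full &lt;- n - 154     # 19 coefficients in the full model
dev_step &lt;- 187.46; k_step &lt;- n - 168     # 5 coefficients in the selected model

# For ungrouped 0/1 data: AIC = residual deviance + 2 * (number of coefficients)
aic_full &lt;- dev_full + 2 * k_full
aic_step &lt;- dev_step + 2 * k_step
round(c(full = aic_full, step = aic_step), 1)</code></pre></div>
<p>The two values round to 209.7 and 197.5, matching the printed AICs: the full model lowers the deviance by about 16 but spends 14 extra coefficients to do so.</p>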
</div>
<div id="母鲎及其追随者预测功效" class="section level3 unnumbered">
<h3>Horseshoe crabs and their satellites (predictive power)</h3>
<p>First, fit the model:</p>
<div class="sourceCode" id="cb267"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb267-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb267-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;horseshoecrabs&quot;</span>)</a>
<a class="sourceLine" id="cb267-3" data-line-number="3">horseshoecrabs<span class="op">$</span>psat &lt;-<span class="st"> </span><span class="kw">as.integer</span>(horseshoecrabs<span class="op">$</span>Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span>)</a>
<a class="sourceLine" id="cb267-4" data-line-number="4">m &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb267-5" data-line-number="5">  psat <span class="op">~</span><span class="st"> </span><span class="kw">factor</span>(Color) <span class="op">+</span><span class="st"> </span>Width, </a>
<a class="sourceLine" id="cb267-6" data-line-number="6">  <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb267-7" data-line-number="7">)</a></code></pre></div>
<p>The confusion matrix (classification table) can then be obtained:</p>
<div class="sourceCode" id="cb268"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb268-1" data-line-number="1">pi0 &lt;-<span class="st"> </span><span class="fl">0.5</span>  <span class="co"># cut-off value</span></a>
<a class="sourceLine" id="cb268-2" data-line-number="2">pred_prob &lt;-<span class="st"> </span><span class="kw">predict</span>(m, <span class="dt">type =</span> <span class="st">&quot;response&quot;</span>)</a>
<a class="sourceLine" id="cb268-3" data-line-number="3">pred_type &lt;-<span class="st"> </span><span class="kw">cut</span>(</a>
<a class="sourceLine" id="cb268-4" data-line-number="4">  pred_prob, <span class="dt">breaks =</span> <span class="kw">c</span>(<span class="dv">0</span>, pi0, <span class="dv">1</span>), <span class="dt">labels =</span> <span class="dv">0</span><span class="op">:</span><span class="dv">1</span>, </a>
<a class="sourceLine" id="cb268-5" data-line-number="5">  <span class="dt">include.lowest =</span> <span class="ot">TRUE</span></a>
<a class="sourceLine" id="cb268-6" data-line-number="6">)</a>
<a class="sourceLine" id="cb268-7" data-line-number="7"><span class="kw">table</span>(horseshoecrabs<span class="op">$</span>psat, pred_type)</a></code></pre></div>
<pre><code>##    pred_type
##      0  1
##   0 31 31
##   1 15 96</code></pre>
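<p><strong>These counts differ somewhat from the table in the book; the cause has not been identified.</strong></p>
<p>The ROC curve and the AUC can be obtained with <code>performance()</code> from the <code>ROCR</code> package:</p>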
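<p>The usual summaries of predictive power follow directly from the classification table. The sketch below re-enters the four counts printed above by hand (the object names are illustrative) and computes sensitivity, specificity and the overall proportion of correct classifications at the cut-off 0.5.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Counts copied from the table above (rows: observed psat, columns: predicted)
tab &lt;- matrix(
  c(31, 15, 31, 96), nrow = 2,
  dimnames = list(observed = c("0", "1"), predicted = c("0", "1"))
)

sensitivity &lt;- tab["1", "1"] / sum(tab["1", ])   # P(predicted 1 | observed 1)
specificity &lt;- tab["0", "0"] / sum(tab["0", ])   # P(predicted 0 | observed 0)
accuracy &lt;- sum(diag(tab)) / sum(tab)            # overall proportion correct
round(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy), 3)</code></pre></div>
<p>With these counts the sensitivity is 96/111 (about 0.865), the specificity is 31/62 (0.5), and the overall proportion correct is 127/173 (about 0.734).</p>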
<div class="sourceCode" id="cb270"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb270-1" data-line-number="1"><span class="kw">library</span>(ROCR)</a>
<a class="sourceLine" id="cb270-2" data-line-number="2"><span class="kw">par</span>(<span class="dt">pty =</span> <span class="st">&quot;s&quot;</span>)</a>
<a class="sourceLine" id="cb270-3" data-line-number="3">pred &lt;-<span class="st"> </span><span class="kw">prediction</span>(<span class="kw">fitted</span>(m), horseshoecrabs<span class="op">$</span>psat)</a>
<a class="sourceLine" id="cb270-4" data-line-number="4">perf &lt;-<span class="st"> </span><span class="kw">performance</span>(pred, <span class="st">&quot;tpr&quot;</span>, <span class="st">&quot;fpr&quot;</span>)</a>
<a class="sourceLine" id="cb270-5" data-line-number="5"><span class="kw">plot</span>(perf, <span class="dt">asp =</span><span class="dv">1</span>, <span class="dt">xaxs=</span><span class="st">&quot;i&quot;</span>, <span class="dt">yaxs=</span><span class="st">&quot;i&quot;</span>)</a></code></pre></div>
<p><img src="cdacode_files/figure-html/unnamed-chunk-108-1.png" width="70%" style="display: block; margin: auto;" /></p>
<div class="sourceCode" id="cb271"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb271-1" data-line-number="1"><span class="kw">performance</span>(pred,<span class="st">&quot;auc&quot;</span>)<span class="op">@</span>y.values[[<span class="dv">1</span>]]</a></code></pre></div>
<pre><code>## [1] 0.7714</code></pre>
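<p>The AUC can also be read as a concordance index: the proportion of (success, failure) pairs in which the success has the higher fitted probability, with ties counted as one half. The following sketch checks that reading by hand. It uses simulated stand-in data rather than the horseshoe crab data, and all object names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
# Simulated stand-in data (assumption: not the horseshoe crab data)
x &lt;- rnorm(200)
y &lt;- rbinom(200, 1, plogis(0.8 * x))
p &lt;- fitted(glm(y ~ x, family = binomial()))

# Concordance: compare every (y = 1, y = 0) pair of fitted probabilities
diffs &lt;- outer(p[y == 1], p[y == 0], "-")
auc_pairs &lt;- mean((diffs &gt; 0) + 0.5 * (diffs == 0))

# Equivalent rank-sum (Mann-Whitney) form
n1 &lt;- sum(y == 1)
n0 &lt;- sum(y == 0)
auc_rank &lt;- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
c(pairs = auc_pairs, rank = auc_rank)</code></pre></div>
<p>Both forms give the same number, which is the area that an ROC routine reports for the same fitted probabilities.</p>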
<p>Alternatively, the <code>roc()</code> function from the <code>pROC</code> package offers more plotting options (see <code>help(plot.roc)</code>):</p>
<div class="sourceCode" id="cb273"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb273-1" data-line-number="1"><span class="kw">library</span>(pROC)</a>
<a class="sourceLine" id="cb273-2" data-line-number="2"><span class="kw">par</span>(<span class="dt">pty =</span> <span class="st">&quot;s&quot;</span>)</a>
<a class="sourceLine" id="cb273-3" data-line-number="3">result &lt;-<span class="st"> </span><span class="kw">roc</span>(</a>
<a class="sourceLine" id="cb273-4" data-line-number="4">  horseshoecrabs<span class="op">$</span>psat,</a>
<a class="sourceLine" id="cb273-5" data-line-number="5">  <span class="kw">predict</span>(m, <span class="dt">type =</span> <span class="st">&quot;response&quot;</span>),</a>
<a class="sourceLine" id="cb273-6" data-line-number="6">  <span class="dt">plot =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb273-7" data-line-number="7">  <span class="dt">auc.polygon =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb273-8" data-line-number="8">  <span class="dt">grid =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb273-9" data-line-number="9">  <span class="dt">asp =</span><span class="dv">1</span>,</a>
<a class="sourceLine" id="cb273-10" data-line-number="10">  <span class="dt">xaxs=</span><span class="st">&quot;i&quot;</span>,</a>
<a class="sourceLine" id="cb273-11" data-line-number="11">  <span class="dt">yaxs=</span><span class="st">&quot;i&quot;</span></a>
<a class="sourceLine" id="cb273-12" data-line-number="12">)</a>
<a class="sourceLine" id="cb273-13" data-line-number="13"><span class="kw">text</span>(<span class="fl">0.3</span>, <span class="fl">0.3</span>, <span class="dt">labels =</span> <span class="kw">paste</span>(<span class="st">&quot;AUC =&quot;</span>, <span class="kw">round</span>(result<span class="op">$</span>auc, <span class="dv">4</span>)), <span class="dt">cex =</span> <span class="fl">1.3</span>)</a></code></pre></div>
<p><img src="cdacode_files/figure-html/unnamed-chunk-109-1.png" width="70%" style="display: block; margin: auto;" /></p>
<p>Finally, compute the correlation between the observed response and the fitted probabilities:</p>
<div class="sourceCode" id="cb274"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb274-1" data-line-number="1"><span class="kw">cor</span>(horseshoecrabs<span class="op">$</span>psat, <span class="kw">fitted</span>(m))</a></code></pre></div>
<pre><code>## [1] 0.4522</code></pre>
</div>
</div>
<div id="model-checking" class="section level2">
<h2><span class="header-section-number">5.2</span> Model checking</h2>
<div id="母鲎及其追随者模型lr检验" class="section level3 unnumbered">
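<h3>Horseshoe crabs and their satellites (likelihood-ratio test)</h3>
<p>The following likelihood-ratio test checks whether a quadratic term in width is needed:</p>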
<div class="sourceCode" id="cb276"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb276-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb276-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;horseshoecrabs&quot;</span>)</a>
<a class="sourceLine" id="cb276-3" data-line-number="3"><span class="co"># Fit the models without and with the quadratic term</span></a>
<a class="sourceLine" id="cb276-4" data-line-number="4">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb276-5" data-line-number="5">  Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span> <span class="op">~</span><span class="st"> </span>Width, </a>
<a class="sourceLine" id="cb276-6" data-line-number="6">  <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb276-7" data-line-number="7">)</a>
<a class="sourceLine" id="cb276-8" data-line-number="8">m2 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb276-9" data-line-number="9">  Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span> <span class="op">~</span><span class="st"> </span>Width <span class="op">+</span><span class="st"> </span><span class="kw">I</span>(Width <span class="op">^</span><span class="st"> </span><span class="dv">2</span>), </a>
<a class="sourceLine" id="cb276-10" data-line-number="10">  <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb276-11" data-line-number="11">)</a>
<a class="sourceLine" id="cb276-12" data-line-number="12"></a>
<a class="sourceLine" id="cb276-13" data-line-number="13"><span class="co"># Inspect the coefficient of the quadratic term</span></a>
<a class="sourceLine" id="cb276-14" data-line-number="14"><span class="kw">summary</span>(m2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = Satellites &gt; 0 ~ Width + I(Width^2), family = binomial(), 
##     data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.119  -1.044   0.507   0.948   1.541  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)
## (Intercept)  14.5916    30.2237    0.48     0.63
## Width        -1.5957     2.3520   -0.68     0.50
## I(Width^2)    0.0405     0.0457    0.89     0.38
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 193.63  on 170  degrees of freedom
## AIC: 199.6
## 
## Number of Fisher Scoring iterations: 5</code></pre>
<div class="sourceCode" id="cb278"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb278-1" data-line-number="1"><span class="co"># Compare the two models (likelihood-ratio test)</span></a>
<a class="sourceLine" id="cb278-2" data-line-number="2"><span class="kw">anova</span>(m1, m2, <span class="dt">test =</span> <span class="st">&quot;LRT&quot;</span>)</a></code></pre></div>
<pre><code>## Analysis of Deviance Table
## 
## Model 1: Satellites &gt; 0 ~ Width
## Model 2: Satellites &gt; 0 ~ Width + I(Width^2)
##   Resid. Df Resid. Dev Df Deviance Pr(&gt;Chi)
## 1       171        194                     
## 2       170        194  1    0.825     0.36</code></pre>
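<p>The p-value in the table is simply the upper tail of a chi-squared distribution evaluated at the difference in residual deviance. A minimal check, reusing only the numbers printed above (the variable names are illustrative):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">lr_stat &lt;- 0.825        # deviance difference from the table above
df_diff &lt;- 171 - 170    # difference in residual degrees of freedom
p_value &lt;- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
round(p_value, 2)</code></pre></div>
<p>This gives 0.36, in agreement with the table, so there is no evidence that the quadratic term improves the fit.</p>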
</div>
<div id="azt和aids拟合优度" class="section level3 unnumbered">
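<h3>AZT and AIDS (goodness of fit)</h3>
<p>There are two ways to fit a logistic regression to a contingency table. One converts the table to a data frame and passes the cell counts through <code>weights=data$Freq</code>;
the other fits the model directly to the grouped counts of the table. For the X2 and G2 goodness-of-fit tests, the second approach is preferable.</p>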
<div class="sourceCode" id="cb280"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb280-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb280-2" data-line-number="2"><span class="kw">library</span>(tidyr)</a>
<a class="sourceLine" id="cb280-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;AZT&quot;</span>)</a>
<a class="sourceLine" id="cb280-4" data-line-number="4">AZT_df &lt;-<span class="st"> </span><span class="kw">spread</span>(<span class="kw">as.data.frame</span>(AZT), Symptoms, Freq)</a>
<a class="sourceLine" id="cb280-5" data-line-number="5">AZT_df</a></code></pre></div>
<pre><code>##    Race AZTUse Yes No
## 1 White    Yes  14 93
## 2 White     No  32 81
## 3 Black    Yes  11 52
## 4 Black     No  12 43</code></pre>
<div class="sourceCode" id="cb282"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb282-1" data-line-number="1">m &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb282-2" data-line-number="2">  <span class="kw">cbind</span>(Yes, No) <span class="op">~</span><span class="st"> </span>(Race <span class="op">==</span><span class="st"> &quot;White&quot;</span>) <span class="op">+</span><span class="st"> </span>(AZTUse <span class="op">==</span><span class="st"> &quot;Yes&quot;</span>), </a>
<a class="sourceLine" id="cb282-3" data-line-number="3">  <span class="dt">data =</span> AZT_df, </a>
<a class="sourceLine" id="cb282-4" data-line-number="4">  <span class="dt">family =</span> <span class="kw">binomial</span>(<span class="st">&quot;logit&quot;</span>)</a>
<a class="sourceLine" id="cb282-5" data-line-number="5">)</a>
<a class="sourceLine" id="cb282-6" data-line-number="6"><span class="kw">summary</span>(m)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = cbind(Yes, No) ~ (Race == &quot;White&quot;) + (AZTUse == 
##     &quot;Yes&quot;), family = binomial(&quot;logit&quot;), data = AZT_df)
## 
## Deviance Residuals: 
##      1       2       3       4  
## -0.555   0.425   0.704  -0.633  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(&gt;|z|)
## (Intercept)          -1.0736     0.2629   -4.08  4.4e-05
## Race == &quot;White&quot;TRUE   0.0555     0.2886    0.19   0.8476
## AZTUse == &quot;Yes&quot;TRUE  -0.7195     0.2790   -2.58   0.0099
##                        
## (Intercept)         ***
## Race == &quot;White&quot;TRUE    
## AZTUse == &quot;Yes&quot;TRUE ** 
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8.3499  on 3  degrees of freedom
## Residual deviance: 1.3835  on 1  degrees of freedom
## AIC: 24.86
## 
## Number of Fisher Scoring iterations: 4</code></pre>
<div class="sourceCode" id="cb284"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb284-1" data-line-number="1"><span class="co"># Degrees of freedom for X2 and G2</span></a>
<a class="sourceLine" id="cb284-2" data-line-number="2">df &lt;-<span class="st"> </span><span class="kw">nrow</span>(AZT_df) <span class="op">-</span><span class="st"> </span><span class="kw">length</span>(<span class="kw">coef</span>(m))</a>
<a class="sourceLine" id="cb284-3" data-line-number="3"></a>
<a class="sourceLine" id="cb284-4" data-line-number="4"><span class="co"># Pearson X2 test</span></a>
<a class="sourceLine" id="cb284-5" data-line-number="5">X2 &lt;-<span class="st"> </span><span class="kw">sum</span>(<span class="kw">resid</span>(m, <span class="dt">type =</span> <span class="st">&quot;pearson&quot;</span>) <span class="op">^</span><span class="st"> </span><span class="dv">2</span>)</a>
<a class="sourceLine" id="cb284-6" data-line-number="6">x2_pvalue &lt;-<span class="st"> </span><span class="dv">1</span><span class="op">-</span><span class="st"> </span><span class="kw">pchisq</span>(X2, df)</a>
<a class="sourceLine" id="cb284-7" data-line-number="7"><span class="kw">c</span>(<span class="dt">X2 =</span> X2, <span class="dt">pvalue =</span> x2_pvalue)</a></code></pre></div>
<pre><code>##     X2 pvalue 
## 1.3910 0.2382</code></pre>
<div class="sourceCode" id="cb286"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb286-1" data-line-number="1"><span class="co"># G2 (deviance) test</span></a>
<a class="sourceLine" id="cb286-2" data-line-number="2">G2 &lt;-<span class="st"> </span><span class="kw">sum</span>(<span class="kw">resid</span>(m, <span class="dt">type =</span> <span class="st">&quot;deviance&quot;</span>) <span class="op">^</span><span class="st"> </span><span class="dv">2</span>)</a>
<a class="sourceLine" id="cb286-3" data-line-number="3">g2_pvalue &lt;-<span class="st"> </span><span class="dv">1</span> <span class="op">-</span><span class="st"> </span><span class="kw">pchisq</span>(G2, df)</a>
<a class="sourceLine" id="cb286-4" data-line-number="4"><span class="kw">c</span>(<span class="dt">G2 =</span> G2, <span class="dt">pvalue =</span> g2_pvalue)</a></code></pre></div>
<pre><code>##     G2 pvalue 
## 1.3835 0.2395</code></pre>
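<p>To illustrate why the grouped form is the convenient one for goodness of fit, the sketch below fits the same model both ways. The counts are re-entered by hand from the <code>AZT_df</code> printout so that the example is self-contained; the object names <code>azt_long</code>, <code>azt_wide</code>, <code>m_long</code> and <code>m_wide</code> are illustrative and not part of <code>cdabookdb</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Counts copied from AZT_df above, one row per cell of the 2 x 2 x 2 table
azt_long &lt;- data.frame(
  Race = factor(rep(c("White", "White", "Black", "Black"), times = 2),
                levels = c("Black", "White")),
  AZTUse = factor(rep(c("Yes", "No", "Yes", "No"), times = 2),
                  levels = c("No", "Yes")),
  Symptoms = rep(c(1L, 0L), each = 4),   # 1 = symptoms, 0 = no symptoms
  Freq = c(14, 32, 11, 12, 93, 81, 52, 43)
)

# Method 1: cell counts passed through `weights`
m_long &lt;- glm(Symptoms ~ Race + AZTUse, data = azt_long,
              weights = Freq, family = binomial())

# Method 2: one row per covariate pattern, response cbind(successes, failures)
azt_wide &lt;- data.frame(
  Race = azt_long$Race[1:4], AZTUse = azt_long$AZTUse[1:4],
  Yes = azt_long$Freq[1:4], No = azt_long$Freq[5:8]
)
m_wide &lt;- glm(cbind(Yes, No) ~ Race + AZTUse, data = azt_wide,
              family = binomial())

rbind(long = coef(m_long), wide = coef(m_wide))
c(long = df.residual(m_long), wide = df.residual(m_wide))</code></pre></div>
<p>The coefficient estimates agree, but the residual degrees of freedom do not: the weighted fit treats each of the 8 rows as an observation, whereas the grouped fit has 4 binomial observations and therefore the single degree of freedom used for X2 and G2 above. Only in the grouped fit is the residual deviance the G2 statistic.</p>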
</div>
<div id="母鲎及其追随者hm检验" class="section level3 unnumbered">
<h3>Horseshoe crabs and their satellites (Hosmer–Lemeshow test)</h3>
<p>The Hosmer–Lemeshow test is available as <code>hoslem.test()</code> in the <code>ResourceSelection</code> package:</p>
<div class="sourceCode" id="cb288"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb288-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb288-2" data-line-number="2"><span class="kw">library</span>(ResourceSelection)</a>
<a class="sourceLine" id="cb288-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;horseshoecrabs&quot;</span>)</a>
<a class="sourceLine" id="cb288-4" data-line-number="4">horseshoecrabs<span class="op">$</span>psat &lt;-<span class="st"> </span><span class="kw">as.integer</span>(horseshoecrabs<span class="op">$</span>Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span>)</a>
<a class="sourceLine" id="cb288-5" data-line-number="5">m &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb288-6" data-line-number="6">  psat <span class="op">~</span><span class="st"> </span><span class="kw">factor</span>(Color) <span class="op">+</span><span class="st"> </span>Width, </a>
<a class="sourceLine" id="cb288-7" data-line-number="7">  <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb288-8" data-line-number="8">)</a>
<a class="sourceLine" id="cb288-9" data-line-number="9"></a>
<a class="sourceLine" id="cb288-10" data-line-number="10"><span class="co"># Hosmer–Lemeshow test</span></a>
<a class="sourceLine" id="cb288-11" data-line-number="11"><span class="kw">hoslem.test</span>(m<span class="op">$</span>y, <span class="kw">fitted</span>(m))</a></code></pre></div>
<pre><code>## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  m$y, fitted(m)
## X-squared = 4.5, df = 8, p-value = 0.8</code></pre>
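<p>For intuition about what the test measures, here is a hand-rolled sketch of a Hosmer–Lemeshow-type statistic: observations are split into 10 groups by deciles of the fitted probability, and observed and expected counts are compared within each group. It uses simulated stand-in data, the object names are illustrative, and the assumption that the cut points coincide with those chosen by <code>hoslem.test()</code> has not been verified here.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(2)
# Simulated stand-in data (assumption: not the horseshoe crab data)
x &lt;- rnorm(300)
y &lt;- rbinom(300, 1, plogis(-0.3 + x))
p &lt;- fitted(glm(y ~ x, family = binomial()))

g &lt;- 10                                     # number of groups
grp &lt;- cut(p, breaks = quantile(p, probs = seq(0, 1, by = 1 / g)),
           include.lowest = TRUE)
obs &lt;- tapply(y, grp, sum)                  # observed successes per group
expd &lt;- tapply(p, grp, sum)                 # expected successes per group
n_g &lt;- tapply(y, grp, length)               # group sizes

# Pearson-type statistic summed over successes and failures in each group
hl_stat &lt;- sum((obs - expd)^2 / expd + (obs - expd)^2 / (n_g - expd))
hl_df &lt;- g - 2
c(statistic = hl_stat, df = hl_df,
  p.value = pchisq(hl_stat, hl_df, lower.tail = FALSE))</code></pre></div>
<p>The reference distribution has <code>g - 2</code> degrees of freedom, which is where the <code>df = 8</code> in the output above comes from when 10 groups are used.</p>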
</div>
<div id="佛罗里达大学研究生入学" class="section level3 unnumbered">
<h3>University of Florida graduate admissions</h3>
<div class="sourceCode" id="cb290"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb290-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb290-2" data-line-number="2"><span class="kw">library</span>(tidyr)</a>
<a class="sourceLine" id="cb290-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;UFAdmissions&quot;</span>)</a>
<a class="sourceLine" id="cb290-4" data-line-number="4">UFAdmissions_df &lt;-<span class="st"> </span><span class="kw">spread</span>(<span class="kw">as.data.frame</span>(UFAdmissions), Decision, Freq)</a>
<a class="sourceLine" id="cb290-5" data-line-number="5">UFAdmissions_df</a></code></pre></div>
<pre><code>##    Dept Gender Admitted Rejected
## 1  anth Female       32       81
## 2  anth   Male       21       41
## 3  astr Female        6        0
## 4  astr   Male        3        8
## 5  chem Female       12       43
## 6  chem   Male       34      110
## 7  clas Female        3        1
## 8  clas   Male        4        0
## 9  comm Female       52      149
## 10 comm   Male        5       10
## 11 comp Female        8        7
## 12 comp   Male        6       12
## 13 engl Female       35      100
## 14 engl   Male       30      112
## 15 geog Female        9        1
## 16 geog   Male       11       11
## 17 geol Female        6        3
## 18 geol   Male       15        6
## 19 germ Female       17        0
## 20 germ   Male        4        1
## 21 hist Female        9        9
## 22 hist   Male       21       19
## 23 lati Female       26        7
## 24 lati   Male       25       16
## 25 ling Female       21       10
## 26 ling   Male        7        8
## 27 math Female       25       18
## 28 math   Male       31       37
## 29 phil Female        3        0
## 30 phil   Male        9        6
## 31 phys Female       10       11
## 32 phys   Male       25       53
## 33 poli Female       25       34
## 34 poli   Male       39       49
## 35 psyc Female        2      123
## 36 psyc   Male        4       41
## 37 reli Female        3        3
## 38 reli   Male        0        2
## 39 roma Female       29       13
## 40 roma   Male        6        3
## 41 soci Female       16       33
## 42 soci   Male        7       17
## 43 stat Female       23        9
## 44 stat   Male       36       14
## 45 zool Female        4       62
## 46 zool   Male       10       54</code></pre>
<div class="sourceCode" id="cb292"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb292-1" data-line-number="1">m &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb292-2" data-line-number="2">  <span class="kw">cbind</span>(Admitted, Rejected) <span class="op">~</span><span class="st"> </span>Dept, </a>
<a class="sourceLine" id="cb292-3" data-line-number="3">  <span class="dt">data =</span> UFAdmissions_df, </a>
<a class="sourceLine" id="cb292-4" data-line-number="4">  <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb292-5" data-line-number="5">)</a>
<a class="sourceLine" id="cb292-6" data-line-number="6"><span class="kw">summary</span>(m)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = cbind(Admitted, Rejected) ~ Dept, family = binomial(), 
##     data = UFAdmissions_df)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.728  -0.653  -0.001   0.762   2.763  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)  -0.8337     0.1645   -5.07  4.0e-07 ***
## Deptastr      0.9515     0.5130    1.85  0.06363 .  
## Deptchem     -0.3681     0.2352   -1.56  0.11767    
## Deptclas      2.7796     1.0816    2.57  0.01017 *  
## Deptcomm     -0.1921     0.2256   -0.85  0.39444    
## Deptcomp      0.5283     0.3887    1.36  0.17411    
## Deptengl     -0.3485     0.2172   -1.60  0.10860    
## Deptgeog      1.3446     0.4005    3.36  0.00079 ***
## Deptgeol      1.6810     0.4310    3.90  9.6e-05 ***
## Deptgerm      3.8783     1.0367    3.74  0.00018 ***
## Depthist      0.9027     0.3100    2.91  0.00359 ** 
## Deptlati      1.6301     0.3003    5.43  5.7e-08 ***
## Deptling      1.2756     0.3440    3.71  0.00021 ***
## Deptmath      0.8517     0.2512    3.39  0.00070 ***
## Deptphil      1.5269     0.5264    2.90  0.00372 ** 
## Deptphys      0.2302     0.2669    0.86  0.38851    
## Deptpoli      0.5738     0.2340    2.45  0.01419 *  
## Deptpsyc     -2.4744     0.4470   -5.54  3.1e-08 ***
## Deptreli      0.3229     0.7486    0.43  0.66622    
## Deptroma      1.6165     0.3437    4.70  2.6e-06 ***
## Deptsoci      0.0572     0.3009    0.19  0.84923    
## Deptstat      1.7758     0.2958    6.00  1.9e-09 ***
## Deptzool     -1.2808     0.3273   -3.91  9.1e-05 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 449.830  on 45  degrees of freedom
## Residual deviance:  44.735  on 23  degrees of freedom
## AIC: 241.4
## 
## Number of Fisher Scoring iterations: 5</code></pre>
<div class="sourceCode" id="cb294"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb294-1" data-line-number="1"><span class="co"># Degrees of freedom for X2 and G2</span></a>
<a class="sourceLine" id="cb294-2" data-line-number="2">df &lt;-<span class="st"> </span><span class="kw">nrow</span>(UFAdmissions_df) <span class="op">-</span><span class="st"> </span><span class="kw">length</span>(<span class="kw">coef</span>(m))</a>
<a class="sourceLine" id="cb294-3" data-line-number="3"></a>
<a class="sourceLine" id="cb294-4" data-line-number="4"><span class="co"># Pearson X2 test</span></a>
<a class="sourceLine" id="cb294-5" data-line-number="5">X2 &lt;-<span class="st"> </span><span class="kw">sum</span>(<span class="kw">resid</span>(m, <span class="dt">type =</span> <span class="st">&quot;pearson&quot;</span>) <span class="op">^</span><span class="st"> </span><span class="dv">2</span>)</a>
<a class="sourceLine" id="cb294-6" data-line-number="6">x2_pvalue &lt;-<span class="st"> </span><span class="dv">1</span><span class="op">-</span><span class="st"> </span><span class="kw">pchisq</span>(X2, df)</a>
<a class="sourceLine" id="cb294-7" data-line-number="7"><span class="kw">c</span>(<span class="dt">X2 =</span> X2, <span class="dt">pvalue =</span> x2_pvalue)</a></code></pre></div>
<pre><code>##       X2   pvalue 
## 40.85236  0.01231</code></pre>
<div class="sourceCode" id="cb296"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb296-1" data-line-number="1"><span class="co"># G2 (deviance) test</span></a>
<a class="sourceLine" id="cb296-2" data-line-number="2">G2 &lt;-<span class="st"> </span><span class="kw">sum</span>(<span class="kw">resid</span>(m, <span class="dt">type =</span> <span class="st">&quot;deviance&quot;</span>) <span class="op">^</span><span class="st"> </span><span class="dv">2</span>)</a>
<a class="sourceLine" id="cb296-3" data-line-number="3">g2_pvalue &lt;-<span class="st"> </span><span class="dv">1</span> <span class="op">-</span><span class="st"> </span><span class="kw">pchisq</span>(G2, df)</a>
<a class="sourceLine" id="cb296-4" data-line-number="4"><span class="kw">c</span>(<span class="dt">G2 =</span> G2, <span class="dt">pvalue =</span> g2_pvalue)</a></code></pre></div>
<pre><code>##        G2    pvalue 
## 44.735165  0.004282</code></pre>
</div>
<div id="心脏病与血压的关系" class="section level3 unnumbered">
<h3>Heart disease and blood pressure</h3>
<div class="sourceCode" id="cb298"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb298-1" data-line-number="1"><span class="kw">library</span>(cdabookfunc)</a>
<a class="sourceLine" id="cb298-2" data-line-number="2"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb298-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;blood_pressure&quot;</span>)</a>
<a class="sourceLine" id="cb298-4" data-line-number="4">m &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb298-5" data-line-number="5">  <span class="kw">cbind</span>(ObservedDisease, SampleSize <span class="op">-</span><span class="st"> </span>ObservedDisease) <span class="op">~</span><span class="st"> </span>BloodPressure,</a>
<a class="sourceLine" id="cb298-6" data-line-number="6">  <span class="dt">data =</span> blood_pressure,</a>
<a class="sourceLine" id="cb298-7" data-line-number="7">  <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb298-8" data-line-number="8">)</a>
<a class="sourceLine" id="cb298-9" data-line-number="9"><span class="kw">summary</span>(m)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = cbind(ObservedDisease, SampleSize - ObservedDisease) ~ 
##     BloodPressure, family = binomial(), data = blood_pressure)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.062  -0.598  -0.225   0.214   1.850  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)   -6.08203    0.72432   -8.40   &lt;2e-16 ***
## BloodPressure  0.02434    0.00484    5.03    5e-07 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30.0226  on 7  degrees of freedom
## Residual deviance:  5.9092  on 6  degrees of freedom
## AIC: 42.61
## 
## Number of Fisher Scoring iterations: 4</code></pre>
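<p>In Table 5.6 of the book, the <code>Dfbeta</code> column differs slightly from what the R function <code>dfbeta()</code> returns.
The table was computed with SAS, and SAS calculates <code>Dfbeta</code> differently from R. The SAS method
is described in <a href="https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_sect049.htm">the SAS documentation on logistic regression diagnostics</a>.</p>
<p>The table also contains quantities that R cannot compute directly, such as <code>c</code> and <code>LR Difference</code>; these are defined in the same SAS documentation.</p>
<p>To compute these quantities the SAS way, I defined two functions in <code>cdabookfunc</code>, <code>dfbetas_logit_sas()</code> and <code>influence_logit_sas()</code>. The former computes <code>Dfbetas</code> with the SAS method, and the latter computes all the diagnostic statistics listed in that SAS documentation.</p>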
<div class="sourceCode" id="cb300"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb300-1" data-line-number="1"><span class="co"># Compare the DFBETAS from R and from the SAS method</span></a>
<a class="sourceLine" id="cb300-2" data-line-number="2">dfbetas_compare &lt;-<span class="st"> </span><span class="kw">data.frame</span>(</a>
<a class="sourceLine" id="cb300-3" data-line-number="3">  <span class="dt">R =</span> <span class="kw">dfbetas</span>(m),</a>
<a class="sourceLine" id="cb300-4" data-line-number="4">  <span class="dt">SAS =</span> <span class="kw">dfbetas_logit_sas</span>(m)</a>
<a class="sourceLine" id="cb300-5" data-line-number="5">)</a>
<a class="sourceLine" id="cb300-6" data-line-number="6">xtable<span class="op">::</span><span class="kw">xtable</span>(dfbetas_compare, <span class="dt">align =</span> <span class="st">&quot;ccccc&quot;</span>, <span class="dt">digits =</span> <span class="dv">2</span>)</a></code></pre></div>
<table>
<tr>
<th>
R..Intercept.
</th>
<th>
R.BloodPressure
</th>
<th>
SAS..Intercept.
</th>
<th>
SAS.BloodPressure
</th>
</tr>
<tr>
<td align="center">
-0.61
</td>
<td align="center">
0.56
</td>
<td align="center">
-0.53
</td>
<td align="center">
0.49
</td>
</tr>
<tr>
<td align="center">
2.50
</td>
<td align="center">
-2.24
</td>
<td align="center">
1.28
</td>
<td align="center">
-1.14
</td>
</tr>
<tr>
<td align="center">
-0.41
</td>
<td align="center">
0.34
</td>
<td align="center">
-0.39
</td>
<td align="center">
0.33
</td>
</tr>
<tr>
<td align="center">
-0.12
</td>
<td align="center">
0.08
</td>
<td align="center">
-0.12
</td>
<td align="center">
0.08
</td>
</tr>
<tr>
<td align="center">
-0.00
</td>
<td align="center">
0.01
</td>
<td align="center">
-0.00
</td>
<td align="center">
0.01
</td>
</tr>
<tr>
<td align="center">
0.05
</td>
<td align="center">
-0.06
</td>
<td align="center">
0.05
</td>
<td align="center">
-0.07
</td>
</tr>
<tr>
<td align="center">
-0.33
</td>
<td align="center">
0.38
</td>
<td align="center">
-0.35
</td>
<td align="center">
0.40
</td>
</tr>
<tr>
<td align="center">
0.10
</td>
<td align="center">
-0.11
</td>
<td align="center">
0.11
</td>
<td align="center">
-0.12
</td>
</tr>
</table>
<div class="sourceCode" id="cb301"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb301-1" data-line-number="1"><span class="co"># Compute all diagnostic statistics</span></a>
<a class="sourceLine" id="cb301-2" data-line-number="2">result &lt;-<span class="st"> </span><span class="kw">influence_logit_sas</span>(m, <span class="st">&quot;data.frame&quot;</span>)</a>
<a class="sourceLine" id="cb301-3" data-line-number="3">result<span class="op">$</span><span class="st">`</span><span class="dt">dfbetas..Intercept.</span><span class="st">`</span> &lt;-<span class="st"> </span><span class="ot">NULL</span></a>
<a class="sourceLine" id="cb301-4" data-line-number="4"><span class="kw">names</span>(result) &lt;-<span class="st"> </span><span class="kw">c</span>(</a>
<a class="sourceLine" id="cb301-5" data-line-number="5">  <span class="st">&quot;hat&quot;</span>, <span class="st">&quot;pearson&quot;</span>, <span class="st">&quot;deviance&quot;</span>, <span class="st">&quot;dfbetas&quot;</span>, </a>
<a class="sourceLine" id="cb301-6" data-line-number="6">  <span class="st">&quot;c&quot;</span>, <span class="st">&quot;cbar&quot;</span>, <span class="st">&quot;difchisq&quot;</span>, <span class="st">&quot;difdev&quot;</span></a>
<a class="sourceLine" id="cb301-7" data-line-number="7">)</a>
<a class="sourceLine" id="cb301-8" data-line-number="8">xtable<span class="op">::</span><span class="kw">xtable</span>(result, <span class="dt">align =</span> <span class="st">&quot;ccccccccc&quot;</span>, <span class="dt">digits =</span> <span class="dv">2</span>)</a></code></pre></div>
<table>
<tr>
<th>
hat
</th>
<th>
pearson
</th>
<th>
deviance
</th>
<th>
dfbetas
</th>
<th>
c
</th>
<th>
cbar
</th>
<th>
difchisq
</th>
<th>
difdev
</th>
</tr>
<tr>
<td align="center">
0.22
</td>
<td align="center">
-0.98
</td>
<td align="center">
-1.06
</td>
<td align="center">
0.49
</td>
<td align="center">
0.34
</td>
<td align="center">
0.26
</td>
<td align="center">
1.22
</td>
<td align="center">
1.39
</td>
</tr>
<tr>
<td align="center">
0.29
</td>
<td align="center">
2.01
</td>
<td align="center">
1.85
</td>
<td align="center">
-1.14
</td>
<td align="center">
2.26
</td>
<td align="center">
1.62
</td>
<td align="center">
5.64
</td>
<td align="center">
5.04
</td>
</tr>
<tr>
<td align="center">
0.26
</td>
<td align="center">
-0.81
</td>
<td align="center">
-0.84
</td>
<td align="center">
0.33
</td>
<td align="center">
0.31
</td>
<td align="center">
0.23
</td>
<td align="center">
0.89
</td>
<td align="center">
0.94
</td>
</tr>
<tr>
<td align="center">
0.22
</td>
<td align="center">
-0.51
</td>
<td align="center">
-0.52
</td>
<td align="center">
0.08
</td>
<td align="center">
0.09
</td>
<td align="center">
0.07
</td>
<td align="center">
0.33
</td>
<td align="center">
0.34
</td>
</tr>
<tr>
<td align="center">
0.13
</td>
<td align="center">
0.12
</td>
<td align="center">
0.12
</td>
<td align="center">
0.01
</td>
<td align="center">
0.00
</td>
<td align="center">
0.00
</td>
<td align="center">
0.02
</td>
<td align="center">
0.02
</td>
</tr>
<tr>
<td align="center">
0.13
</td>
<td align="center">
-0.30
</td>
<td align="center">
-0.31
</td>
<td align="center">
-0.07
</td>
<td align="center">
0.02
</td>
<td align="center">
0.01
</td>
<td align="center">
0.11
</td>
<td align="center">
0.11
</td>
</tr>
<tr>
<td align="center">
0.38
</td>
<td align="center">
0.51
</td>
<td align="center">
0.50
</td>
<td align="center">
0.40
</td>
<td align="center">
0.26
</td>
<td align="center">
0.16
</td>
<td align="center">
0.43
</td>
<td align="center">
0.42
</td>
</tr>
<tr>
<td align="center">
0.38
</td>
<td align="center">
-0.14
</td>
<td align="center">
-0.14
</td>
<td align="center">
-0.12
</td>
<td align="center">
0.02
</td>
<td align="center">
0.01
</td>
<td align="center">
0.03
</td>
<td align="center">
0.03
</td>
</tr>
</table>
<div class="sourceCode" id="cb302"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb302-1" data-line-number="1"><span class="co"># Standardized Pearson residuals</span></a>
<a class="sourceLine" id="cb302-2" data-line-number="2"><span class="kw">round</span>(<span class="kw">rstandard</span>(m, <span class="dt">type =</span> <span class="st">&quot;pearson&quot;</span>), <span class="dv">2</span>)</a></code></pre></div>
<pre><code>##     1     2     3     4     5     6     7     8 
## -1.11  2.37 -0.95 -0.57  0.13 -0.33  0.65 -0.18</code></pre>
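<p>The standardized Pearson residuals above can be recovered from the Pearson residuals and hat values in the preceding table, because each one is the Pearson residual divided by <code>sqrt(1 - hat)</code>. The sketch below checks this identity with base R only, and also writes out the usual SAS-style definitions of <code>c</code>, <code>cbar</code>, <code>difchisq</code> and <code>difdev</code>. It fits an unrelated model to the built-in <code>mtcars</code> data purely for illustration; it is not the implementation of <code>influence_logit_sas()</code>, and the formulas are an assumption about what that function computes.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Illustrative model on built-in data (not the blood pressure data used above)
fit &lt;- glm(vs ~ mpg, family = binomial(), data = mtcars)

h  &lt;- hatvalues(fit)                  # leverages
rp &lt;- resid(fit, type = "pearson")    # Pearson residuals
rd &lt;- resid(fit, type = "deviance")   # deviance residuals

# Standardized Pearson residual: r / sqrt(1 - h)
std_pearson &lt;- rp / sqrt(1 - h)

# Confidence interval displacement and change-in-fit statistics
c_stat   &lt;- rp^2 * h / (1 - h)^2
cbar     &lt;- rp^2 * h / (1 - h)
difchisq &lt;- cbar / h                  # equals rp^2 / (1 - h)
difdev   &lt;- rd^2 + cbar

# Agrees with R's own standardized Pearson residuals
all.equal(std_pearson, rstandard(fit, type = "pearson"))</code></pre></div>
<p>For the first observation in the table above, <code>-0.98 / sqrt(1 - 0.22)</code> is approximately -1.11, which matches the first value printed by <code>rstandard()</code>.</p>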
</div>
</div>
<div id="sparse-data-logistic" class="section level2">
<h2><span class="header-section-number">5.3</span> Sparse data effects</h2>
<div id="稀疏数据的临床试验结果" class="section level3 unnumbered">
<h3>Clinical trial results with sparse data</h3>
<div class="sourceCode" id="cb304"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb304-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb304-2" data-line-number="2"><span class="kw">library</span>(tidyr)</a>
<a class="sourceLine" id="cb304-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;treatment3&quot;</span>)</a>
<a class="sourceLine" id="cb304-4" data-line-number="4">treatment3_df1 &lt;-<span class="st"> </span><span class="kw">as.data.frame</span>(treatment3)</a>
<a class="sourceLine" id="cb304-5" data-line-number="5">treatment3_df1<span class="op">$</span>Center &lt;-<span class="st"> </span><span class="kw">factor</span>(treatment3_df1<span class="op">$</span>Center, <span class="dv">5</span><span class="op">:</span><span class="dv">1</span>)</a>
<a class="sourceLine" id="cb304-6" data-line-number="6">treatment3_df2 &lt;-<span class="st"> </span><span class="kw">spread</span>(treatment3_df1, Response, Freq)</a>
<a class="sourceLine" id="cb304-7" data-line-number="7"></a>
<a class="sourceLine" id="cb304-8" data-line-number="8"></a>
<a class="sourceLine" id="cb304-9" data-line-number="9"><span class="co"># Fit the regression from the long-format data frame</span></a>
<a class="sourceLine" id="cb304-10" data-line-number="10">m1_df1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb304-11" data-line-number="11">  (Response <span class="op">==</span><span class="st"> &quot;Success&quot;</span>) <span class="op">~</span><span class="st"> </span>Center <span class="op">+</span><span class="st"> </span>Treatment, </a>
<a class="sourceLine" id="cb304-12" data-line-number="12">  <span class="dt">family =</span> <span class="kw">binomial</span>(), <span class="dt">weights =</span> Freq,</a>
<a class="sourceLine" id="cb304-13" data-line-number="13">  <span class="dt">data =</span> treatment3_df1</a>
<a class="sourceLine" id="cb304-14" data-line-number="14">)</a>
<a class="sourceLine" id="cb304-15" data-line-number="15"><span class="kw">summary</span>(m1_df1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = (Response == &quot;Success&quot;) ~ Center + Treatment, family = binomial(), 
##     data = treatment3_df1, weights = Freq)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9488  -0.7277  -0.0001   0.5665   3.0974  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(&gt;|z|)  
## (Intercept)        -0.476      0.506   -0.94    0.346  
## Center4             1.063      0.701    1.52    0.129  
## Center3           -18.614   2985.252   -0.01    0.995  
## Center2            -2.180      1.133   -1.92    0.054 .
## Center1           -18.587   3180.370   -0.01    0.995  
## TreatmentPlacebo   -1.546      0.702   -2.20    0.028 *
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 85.77  on 14  degrees of freedom
## Residual deviance: 57.74  on  9  degrees of freedom
## AIC: 69.74
## 
## Number of Fisher Scoring iterations: 17</code></pre>
<div class="sourceCode" id="cb306"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb306-1" data-line-number="1"><span class="co"># Fit the regression from the contingency-table layout</span></a>
<a class="sourceLine" id="cb306-2" data-line-number="2">m1_df2 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb306-3" data-line-number="3">  <span class="kw">cbind</span>(Success, Failure) <span class="op">~</span><span class="st"> </span>Center <span class="op">+</span><span class="st"> </span>Treatment, </a>
<a class="sourceLine" id="cb306-4" data-line-number="4">  <span class="dt">family =</span> <span class="kw">binomial</span>(), </a>
<a class="sourceLine" id="cb306-5" data-line-number="5">  <span class="dt">data =</span> treatment3_df2</a>
<a class="sourceLine" id="cb306-6" data-line-number="6">)</a>
<a class="sourceLine" id="cb306-7" data-line-number="7"><span class="kw">summary</span>(m1_df2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = cbind(Success, Failure) ~ Center + Treatment, family = binomial(), 
##     data = treatment3_df2)
## 
## Deviance Residuals: 
##      1       2       3       4       5       6       7  
## -0.201   0.294   0.151  -0.173   0.000   0.000   0.161  
##      8       9      10  
## -0.545   0.000   0.000  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(&gt;|z|)  
## (Intercept)         -0.476      0.506   -0.94    0.346  
## Center4              1.063      0.701    1.52    0.129  
## Center3            -22.565  21523.645    0.00    0.999  
## Center2             -2.180      1.133   -1.92    0.054 .
## Center1            -22.570  23296.396    0.00    0.999  
## TreatmentPlacebo    -1.546      0.702   -2.20    0.028 *
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 28.53202  on 9  degrees of freedom
## Residual deviance:  0.50214  on 4  degrees of freedom
## AIC: 24.86
## 
## Number of Fisher Scoring iterations: 21</code></pre>
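<p>In both models the coefficients for centers 1 and 3 have very large absolute values and standard errors, and their estimates differ between the two fits. The remaining coefficients are well behaved and have identical estimates and standard errors in both models.</p>
<p>Next, drop the intercept and refit.</p>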
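<p>Estimates of this kind typically appear when a category contains no successes (or no failures): the likelihood keeps increasing as the coefficient moves toward minus infinity, so the reported value mostly reflects where the iterations stopped. The sketch below reproduces the pattern with hypothetical counts. The data frame <code>toy</code> and its values are invented for illustration and are not the <code>treatment3</code> data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical grouped data: center B has no successes in either arm
toy &lt;- data.frame(
  center  = factor(c("A", "A", "B", "B")),
  treat   = factor(c("drug", "placebo", "drug", "placebo")),
  success = c(3, 1, 0, 0),
  failure = c(7, 9, 10, 10)
)

# Warnings about convergence or fitted probabilities are expected here
fit &lt;- suppressWarnings(glm(
  cbind(success, failure) ~ center + treat,
  family = binomial(), data = toy
))

# centerB gets a large negative estimate with an enormous standard error,
# while the treatment effect stays finite
round(coef(summary(fit))[, c("Estimate", "Std. Error")], 2)
fit$iter</code></pre></div>
<p>Because the estimate for such a category depends on the stopping point, two fits that stop after different numbers of iterations can plausibly report different values for it while agreeing on every other coefficient.</p>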
<div class="sourceCode" id="cb308"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb308-1" data-line-number="1">m2_df1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb308-2" data-line-number="2">  (Response <span class="op">==</span><span class="st"> &quot;Success&quot;</span>) <span class="op">~</span><span class="st"> </span>Center <span class="op">+</span><span class="st"> </span>Treatment <span class="op">-</span><span class="st"> </span><span class="dv">1</span>, </a>
<a class="sourceLine" id="cb308-3" data-line-number="3">  <span class="dt">family =</span> <span class="kw">binomial</span>(), <span class="dt">weights =</span> Freq,</a>
<a class="sourceLine" id="cb308-4" data-line-number="4">  <span class="dt">data =</span> treatment3_df1</a>
<a class="sourceLine" id="cb308-5" data-line-number="5">)</a>
<a class="sourceLine" id="cb308-6" data-line-number="6"><span class="kw">summary</span>(m2_df1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = (Response == &quot;Success&quot;) ~ Center + Treatment - 
##     1, family = binomial(), data = treatment3_df1, weights = Freq)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9488  -0.7277  -0.0001   0.5665   3.0974  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(&gt;|z|)  
## Center5            -0.476      0.506   -0.94    0.346  
## Center4             0.587      0.605    0.97    0.332  
## Center3           -19.090   2985.252   -0.01    0.995  
## Center2            -2.657      1.036   -2.56    0.010 *
## Center1           -19.064   3180.370   -0.01    0.995  
## TreatmentPlacebo   -1.546      0.702   -2.20    0.028 *
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 130.31  on 15  degrees of freedom
## Residual deviance:  57.74  on  9  degrees of freedom
## AIC: 69.74
## 
## Number of Fisher Scoring iterations: 17</code></pre>
<div class="sourceCode" id="cb310"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb310-1" data-line-number="1">m2_df2 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb310-2" data-line-number="2">  <span class="kw">cbind</span>(Success, Failure) <span class="op">~</span><span class="st"> </span>Center <span class="op">+</span><span class="st"> </span>Treatment <span class="op">-</span><span class="st"> </span><span class="dv">1</span>, </a>
<a class="sourceLine" id="cb310-3" data-line-number="3">  <span class="dt">family =</span> <span class="kw">binomial</span>(), </a>
<a class="sourceLine" id="cb310-4" data-line-number="4">  <span class="dt">data =</span> treatment3_df2</a>
<a class="sourceLine" id="cb310-5" data-line-number="5">)</a>
<a class="sourceLine" id="cb310-6" data-line-number="6"><span class="kw">summary</span>(m2_df2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = cbind(Success, Failure) ~ Center + Treatment - 
##     1, family = binomial(), data = treatment3_df2)
## 
## Deviance Residuals: 
##      1       2       3       4       5       6       7  
## -0.201   0.294   0.151  -0.173   0.000   0.000   0.161  
##      8       9      10  
## -0.545   0.000   0.000  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(&gt;|z|)  
## Center5             -0.476      0.506   -0.94    0.346  
## Center4              0.587      0.605    0.97    0.332  
## Center3            -23.041  21523.645    0.00    0.999  
## Center2             -2.657      1.036   -2.56    0.010 *
## Center1            -23.046  23296.396    0.00    0.999  
## TreatmentPlacebo    -1.546      0.702   -2.20    0.028 *
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 73.07369  on 10  degrees of freedom
## Residual deviance:  0.50214  on  4  degrees of freedom
## AIC: 24.86
## 
## Number of Fisher Scoring iterations: 21</code></pre>
<p>The results are similar to the previous fits.</p>
<p>Next, fit a model that ignores the center effect.</p>
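<p>Dropping the intercept only changes the parameterization: each <code>Center</code> coefficient in the no-intercept model equals the intercept of the earlier model plus the corresponding center contrast, with center 5 as the baseline. A small check using the rounded estimates printed above:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Estimates copied from the summary of m1_df1 (model with intercept)
intercept &lt;- -0.476
contrast  &lt;- c(Center4 = 1.063, Center2 = -2.180)

# Implied no-intercept coefficients; compare with 0.587 and -2.657 in m2_df1
intercept + contrast</code></pre></div>
<p>The values agree with the no-intercept summary up to rounding of the printed estimates, and the treatment coefficient is unchanged.</p>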
<div class="sourceCode" id="cb312"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb312-1" data-line-number="1">treatment3_margin &lt;-<span class="st"> </span><span class="kw">margin.table</span>(treatment3, <span class="kw">c</span>(<span class="dv">2</span>, <span class="dv">3</span>))</a>
<a class="sourceLine" id="cb312-2" data-line-number="2">treatment &lt;-<span class="st"> </span><span class="kw">rownames</span>(treatment3_margin)</a>
<a class="sourceLine" id="cb312-3" data-line-number="3">treatment3_margin</a></code></pre></div>
<pre><code>##              Response
## Treatment     Success Failure
##   Active drug      12      36
##   Placebo           4      42</code></pre>
<div class="sourceCode" id="cb314"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb314-1" data-line-number="1">m3 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb314-2" data-line-number="2">  treatment3_margin <span class="op">~</span><span class="st"> </span>treatment, </a>
<a class="sourceLine" id="cb314-3" data-line-number="3">  <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb314-4" data-line-number="4">)</a>
<a class="sourceLine" id="cb314-5" data-line-number="5"></a>
<a class="sourceLine" id="cb314-6" data-line-number="6"><span class="kw">summary</span>(m3)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = treatment3_margin ~ treatment, family = binomial())
## 
## Deviance Residuals: 
## [1]  0  0
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)        -1.099      0.333   -3.30  0.00098 ***
## treatmentPlacebo   -1.253      0.620   -2.02  0.04346 *  
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance:  4.6054e+00  on 1  degrees of freedom
## Residual deviance: -5.3291e-15  on 0  degrees of freedom
## AIC: 11.23
## 
## Number of Fisher Scoring iterations: 3</code></pre>
<p>The model coefficients are now well behaved.</p>
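<p>With a single binary predictor the fit is saturated, so the coefficients can be reproduced directly from the 2 x 2 marginal table: the intercept is the log odds of success under the active drug, and the slope is the log odds ratio for placebo versus active drug. The sketch below copies the counts from the <code>treatment3_margin</code> output; the variable names are chosen here for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Counts copied from treatment3_margin above
tab &lt;- matrix(
  c(12, 36,
     4, 42),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Treatment = c("Active drug", "Placebo"),
    Response  = c("Success", "Failure")
  )
)

# Intercept: log odds of success under the active drug
log_odds_active &lt;- log(tab[1, 1] / tab[1, 2])

# Slope: log odds ratio, placebo versus active drug
log_or &lt;- log((tab[2, 1] / tab[2, 2]) / (tab[1, 1] / tab[1, 2]))

# Large-sample standard error of the log odds ratio
se_log_or &lt;- sqrt(sum(1 / tab))

round(c(intercept = log_odds_active, slope = log_or, se = se_log_or), 3)</code></pre></div>
<p>These values agree with the estimates (-1.099 and -1.253) and the standard error (0.620) reported by <code>summary(m3)</code>.</p>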
</div>
</div>
<div id="conditional-logistic" class="section level2">
<h2><span class="header-section-number">5.4</span> Conditional logistic regression and exact inference</h2>
<div id="晋升能力" class="section level3 unnumbered">
<h3>Promotion</h3>
<div class="sourceCode" id="cb316"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb316-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb316-2" data-line-number="2"><span class="kw">library</span>(tidyr)</a>
<a class="sourceLine" id="cb316-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;promotion_race&quot;</span>)</a>
<a class="sourceLine" id="cb316-4" data-line-number="4">promotion_race_df &lt;-<span class="st"> </span><span class="kw">spread</span>(<span class="kw">as.data.frame</span>(promotion_race), Promotion, Freq)</a>
<a class="sourceLine" id="cb316-5" data-line-number="5"></a>
<a class="sourceLine" id="cb316-6" data-line-number="6">m &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb316-7" data-line-number="7">  <span class="kw">cbind</span>(Yes, No) <span class="op">~</span><span class="st"> </span>Race <span class="op">+</span><span class="st"> </span>Month, </a>
<a class="sourceLine" id="cb316-8" data-line-number="8">  <span class="dt">data =</span> promotion_race_df,</a>
<a class="sourceLine" id="cb316-9" data-line-number="9">  <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb316-10" data-line-number="10">)</a>
<a class="sourceLine" id="cb316-11" data-line-number="11"></a>
<a class="sourceLine" id="cb316-12" data-line-number="12"><span class="kw">summary</span>(m)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = cbind(Yes, No) ~ Race + Month, family = binomial(), 
##     data = promotion_race_df)
## 
## Deviance Residuals: 
##         1          2          3          4          5  
## -9.52e-06  -1.06e-05  -7.98e-06  -4.20e-08   0.00e+00  
##         6  
##  0.00e+00  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(&gt;|z|)
## (Intercept)      -25.764  52607.802    0.00     1.00
## RaceWhite         24.377  52607.802    0.00     1.00
## MonthAugust        0.208      0.800    0.26     0.80
## MonthSeptember    -0.486      0.943   -0.51     0.61
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8.2664e+00  on 5  degrees of freedom
## Residual deviance: 2.6585e-10  on 2  degrees of freedom
## AIC: 16.52
## 
## Number of Fisher Scoring iterations: 23</code></pre>
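<p>The estimated race effect in this model is extreme: the <code>RaceWhite</code> coefficient is 24.38 (equivalently, -24.38 for Black relative to White), with a standard error of about 52608.</p>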
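<p>An estimate this large with such a standard error suggests that the ordinary maximum likelihood estimate is effectively infinite, which is the situation exact conditional inference is designed for. One option in base R is <code>mantelhaen.test()</code> with <code>exact = TRUE</code>, which conditions on the margins of each 2 x 2 stratum. The sketch below uses hypothetical counts in an array named <code>toy</code>, arranged as race by promotion by month. The counts are invented for illustration and are not the <code>promotion_race</code> data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical 2 x 2 x 3 table; each group of four fills one stratum column by column
toy &lt;- array(
  c(0, 5, 6, 15,    # stratum M1: Black/Yes, White/Yes, Black/No, White/No
    0, 3, 8, 12,    # stratum M2
    0, 2, 7, 14),   # stratum M3
  dim = c(2, 2, 3),
  dimnames = list(
    Race      = c("Black", "White"),
    Promotion = c("Yes", "No"),
    Month     = c("M1", "M2", "M3")
  )
)

# Exact conditional test of conditional independence given month
res &lt;- mantelhaen.test(toy, exact = TRUE)
res$p.value    # exact two-sided p-value
res$conf.int   # exact confidence interval for the common odds ratio</code></pre></div>
<p>Because one cell is zero in every stratum, the conditional estimate of the common odds ratio may sit on the boundary of the parameter space, but the exact p-value and confidence interval remain well defined.</p>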
</div>
</div>
<div id="logistic-sample-num" class="section level2">
<h2><span class="header-section-number">5.5</span> Sample size and power for logistic regression</h2>
<div id="样本量计算" class="section level3 unnumbered">
<h3>Sample size calculation</h3>
<p>The sample size needed to compare two proportions can be computed with <code>samplesize_prop()</code> from <code>cdabookcode</code>.</p>
<div class="sourceCode" id="cb318"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb318-1" data-line-number="1"><span class="kw">library</span>(cdabookfunc)</a>
<a class="sourceLine" id="cb318-2" data-line-number="2"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb318-3" data-line-number="3"><span class="kw">samplesize_prop</span>(<span class="fl">0.2</span>, <span class="fl">0.3</span>, <span class="fl">0.05</span>, <span class="fl">0.1</span>)</a></code></pre></div>
<pre><code>## [1] 389</code></pre>
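<p>The result can be cross-checked against the standard large-sample formula for comparing two proportions with equal group sizes, <code>n = (z_{alpha/2} + z_beta)^2 * [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2</code>. The sketch below assumes that the four arguments of <code>samplesize_prop()</code> are the two proportions, the significance level, and the type II error rate. The helper <code>n_per_group()</code> is a hypothetical stand-in written for this check, not the package's code.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Per-group sample size for a two-sided test comparing two proportions
n_per_group &lt;- function(p1, p2, alpha, beta) {
  z &lt;- qnorm(1 - alpha / 2) + qnorm(1 - beta)
  ceiling(z^2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2)^2)
}

n_per_group(0.2, 0.3, 0.05, 0.1)</code></pre></div>
<p>Under these assumptions the formula gives 389 per group, the same value as the output above.</p>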
</div>
</div>
<div id="problem-ch5" class="section level2 unnumbered">
<h2>Exercises</h2>
<div id="第10题" class="section level3 unnumbered">
<h3>Exercise 10</h3>
<ol style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb320"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb320-1" data-line-number="1"><span class="kw">library</span>(dplyr)</a>
<a class="sourceLine" id="cb320-2" data-line-number="2"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb320-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;horseshoecrabs&quot;</span>)</a>
<a class="sourceLine" id="cb320-4" data-line-number="4">horseshoecrabs<span class="op">$</span>psat &lt;-<span class="st"> </span>horseshoecrabs<span class="op">$</span>Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span></a>
<a class="sourceLine" id="cb320-5" data-line-number="5">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb320-6" data-line-number="6">  psat <span class="op">~</span><span class="st"> </span>Weight, </a>
<a class="sourceLine" id="cb320-7" data-line-number="7">  <span class="dt">data =</span> horseshoecrabs, </a>
<a class="sourceLine" id="cb320-8" data-line-number="8">  <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb320-9" data-line-number="9">)</a>
<a class="sourceLine" id="cb320-10" data-line-number="10"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Weight, family = binomial(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.111  -1.075   0.543   0.912   1.629  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)   -3.695      0.880   -4.20  2.7e-05 ***
## Weight         1.815      0.377    4.82  1.4e-06 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.74  on 171  degrees of freedom
## AIC: 199.7
## 
## Number of Fisher Scoring iterations: 4</code></pre>
<div class="sourceCode" id="cb322"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb322-1" data-line-number="1"><span class="co"># Predicted class (cutoff: sample proportion with satellites)</span></a>
<a class="sourceLine" id="cb322-2" data-line-number="2">pred_type &lt;-<span class="st"> </span><span class="kw">fitted</span>(m1) <span class="op">&gt;</span><span class="st"> </span><span class="kw">mean</span>(horseshoecrabs<span class="op">$</span>psat)</a>
<a class="sourceLine" id="cb322-3" data-line-number="3"><span class="co"># True class</span></a>
<a class="sourceLine" id="cb322-4" data-line-number="4">true_type &lt;-<span class="st"> </span>horseshoecrabs<span class="op">$</span>psat</a>
<a class="sourceLine" id="cb322-5" data-line-number="5"><span class="co"># Confusion matrix</span></a>
<a class="sourceLine" id="cb322-6" data-line-number="6"><span class="kw">table</span>(true_type, pred_type)</a></code></pre></div>
<pre><code>##          pred_type
## true_type FALSE TRUE
##     FALSE    45   17
##     TRUE     43   68</code></pre>
<div class="sourceCode" id="cb324"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb324-1" data-line-number="1"><span class="co"># Sensitivity and specificity</span></a>
<a class="sourceLine" id="cb324-2" data-line-number="2"><span class="kw">table</span>(true_type, pred_type) <span class="op">%&gt;%</span></a>
<a class="sourceLine" id="cb324-3" data-line-number="3"><span class="st">  </span><span class="kw">prop.table</span>(<span class="dt">margin =</span> <span class="dv">1</span>) <span class="op">%&gt;%</span></a>
<a class="sourceLine" id="cb324-4" data-line-number="4"><span class="st">  </span><span class="kw">round</span>(<span class="dv">4</span>)</a></code></pre></div>
<pre><code>##          pred_type
## true_type  FALSE   TRUE
##     FALSE 0.7258 0.2742
##     TRUE  0.3874 0.6126</code></pre>
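<p>The model's sensitivity is therefore 0.6126 and its specificity is 0.7258.</p>
<p>That is, for a female horseshoe crab that has satellites, the model predicts satellites with probability 0.6126; for one without satellites, the model predicts none with probability 0.7258.</p>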
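<p>Both figures are row proportions of the confusion matrix, which the following sketch recomputes from the counts printed above. The object name <code>cm</code> is chosen here for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Confusion matrix copied from the output above (rows: truth, columns: prediction)
cm &lt;- matrix(
  c(45, 17,
    43, 68),
  nrow = 2, byrow = TRUE,
  dimnames = list(true_type = c("FALSE", "TRUE"), pred_type = c("FALSE", "TRUE"))
)

sensitivity &lt;- cm["TRUE", "TRUE"] / sum(cm["TRUE", ])      # 68 / 111
specificity &lt;- cm["FALSE", "FALSE"] / sum(cm["FALSE", ])   # 45 / 62
round(c(sensitivity = sensitivity, specificity = specificity), 4)</code></pre></div>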
<ol start="2" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb326"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb326-1" data-line-number="1"><span class="kw">library</span>(pROC)</a>
<a class="sourceLine" id="cb326-2" data-line-number="2"><span class="kw">par</span>(<span class="dt">pty =</span> <span class="st">&quot;s&quot;</span>)</a>
<a class="sourceLine" id="cb326-3" data-line-number="3">result &lt;-<span class="st"> </span><span class="kw">roc</span>(</a>
<a class="sourceLine" id="cb326-4" data-line-number="4">  true_type,</a>
<a class="sourceLine" id="cb326-5" data-line-number="5">  <span class="kw">fitted</span>(m1),</a>
<a class="sourceLine" id="cb326-6" data-line-number="6">  <span class="dt">plot =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb326-7" data-line-number="7">  <span class="dt">auc.polygon =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb326-8" data-line-number="8">  <span class="dt">grid =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb326-9" data-line-number="9">  <span class="dt">asp =</span><span class="dv">1</span>,</a>
<a class="sourceLine" id="cb326-10" data-line-number="10">  <span class="dt">xaxs=</span><span class="st">&quot;i&quot;</span>,</a>
<a class="sourceLine" id="cb326-11" data-line-number="11">  <span class="dt">yaxs=</span><span class="st">&quot;i&quot;</span></a>
<a class="sourceLine" id="cb326-12" data-line-number="12">)</a>
<a class="sourceLine" id="cb326-13" data-line-number="13"><span class="kw">text</span>(<span class="fl">0.3</span>, <span class="fl">0.3</span>, <span class="dt">labels =</span> <span class="kw">paste</span>(<span class="st">&quot;AUC =&quot;</span>, <span class="kw">round</span>(result<span class="op">$</span>auc, <span class="dv">4</span>)), <span class="dt">cex =</span> <span class="fl">1.3</span>)</a></code></pre></div>
<p><img src="cdacode_files/figure-html/unnamed-chunk-125-1.png" width="70%" style="display: block; margin: auto;" /></p>
<p>The AUC is 0.7379.</p>
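<p>The AUC can be read as a concordance index: the proportion of (positive, negative) pairs in which the positive case receives the higher fitted probability, with ties counted as one half. The sketch below computes it directly in base R on hypothetical outcomes and scores, which are invented for illustration and are not the horseshoe crab data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical outcomes and fitted probabilities, for illustration only
y     &lt;- c(1, 1, 0, 1, 0, 0)
score &lt;- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

pos &lt;- score[y == 1]
neg &lt;- score[y == 0]

# Compare every positive with every negative; ties count as one half
cmp &lt;- outer(pos, neg, "-")
auc &lt;- mean((cmp &gt; 0) + 0.5 * (cmp == 0))
auc</code></pre></div>
<p>Here 8 of the 9 pairs are concordant, so the value is 8/9.</p>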
<ol start="3" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb327"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb327-1" data-line-number="1"><span class="kw">library</span>(ResourceSelection)</a>
<a class="sourceLine" id="cb327-2" data-line-number="2"><span class="kw">hoslem.test</span>(m1<span class="op">$</span>y, <span class="kw">fitted</span>(m1), <span class="dt">g =</span> <span class="dv">10</span>)</a></code></pre></div>
<pre><code>## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  m1$y, fitted(m1)
## X-squared = 7.8, df = 8, p-value = 0.4</code></pre>
<p>The p-value is 0.4499, which is greater than 0.05, so there is no evidence of lack of fit and we regard the model as adequate.</p>
<ol start="4" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb329"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb329-1" data-line-number="1">m2 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb329-2" data-line-number="2">  psat <span class="op">~</span><span class="st"> </span>Weight <span class="op">+</span><span class="st"> </span><span class="kw">I</span>(Weight <span class="op">^</span><span class="st"> </span><span class="dv">2</span>),</a>
<a class="sourceLine" id="cb329-3" data-line-number="3">  <span class="dt">data =</span> horseshoecrabs,</a>
<a class="sourceLine" id="cb329-4" data-line-number="4">  <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb329-5" data-line-number="5">)</a>
<a class="sourceLine" id="cb329-6" data-line-number="6"><span class="kw">summary</span>(m2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Weight + I(Weight^2), family = binomial(), 
##     data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.183  -1.074   0.520   0.939   1.543  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)
## (Intercept)   -1.888      3.549   -0.53     0.59
## Weight         0.218      3.082    0.07     0.94
## I(Weight^2)    0.339      0.654    0.52     0.60
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.46  on 170  degrees of freedom
## AIC: 201.5
## 
## Number of Fisher Scoring iterations: 5</code></pre>
<ol start="5" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb331"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb331-1" data-line-number="1"><span class="kw">c</span>(<span class="dt">m1 =</span> <span class="kw">AIC</span>(m1), <span class="dt">m2 =</span> <span class="kw">AIC</span>(m2))</a></code></pre></div>
<pre><code>##    m1    m2 
## 199.7 201.5</code></pre>
<p>Model 1 has the smaller AIC, so we prefer model 1; that is, the quadratic term is not needed.</p>
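<p>Since model 1 is nested in model 2, the AIC comparison can be complemented by a likelihood-ratio test on the difference in residual deviances. The sketch below uses the deviances and degrees of freedom printed in the two summaries above; the variable names are chosen here for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Residual deviances and degrees of freedom copied from the two summaries
dev_m1 &lt;- 195.74; df_m1 &lt;- 171
dev_m2 &lt;- 195.46; df_m2 &lt;- 170

lr_stat &lt;- dev_m1 - dev_m2            # likelihood-ratio statistic
lr_df   &lt;- df_m1 - df_m2
p_value &lt;- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = round(p_value, 3))</code></pre></div>
<p>The p-value of about 0.6 agrees with the Wald test for <code>I(Weight^2)</code> and supports the same conclusion.</p>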
</div>
<div id="第18题-1" class="section level3 unnumbered">
<h3>Exercise 18</h3>
<ol style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb333"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb333-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb333-2" data-line-number="2"><span class="kw">library</span>(tidyr)</a>
<a class="sourceLine" id="cb333-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;smoking_lungcancer_cn&quot;</span>)</a>
<a class="sourceLine" id="cb333-4" data-line-number="4"><span class="kw">ftable</span>(smoking_lungcancer_cn)</a></code></pre></div>
<pre><code>##                      Smoking Yes  No
## City      LungCancer                
## Beijing   Yes                126  35
##           No                 100  61
## Shanghai  Yes                908 497
##           No                 688 807
## Shenyang  Yes                913 336
##           No                 747 598
## Nanjing   Yes                235  58
##           No                 172 121
## Harbin    Yes                402 121
##           No                 308 215
## Zhengzhou Yes                182  72
##           No                 156  98
## Taiyuan   Yes                 60  11
##           No                  99  43
## Nanchang  Yes                104  21
##           No                  89  36</code></pre>
<div class="sourceCode" id="cb335"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb335-1" data-line-number="1">smoking_lungcancer_cn &lt;-<span class="st"> </span><span class="kw">spread</span>(</a>
<a class="sourceLine" id="cb335-2" data-line-number="2">  <span class="kw">as.data.frame</span>(smoking_lungcancer_cn), LungCancer, Freq</a>
<a class="sourceLine" id="cb335-3" data-line-number="3">)</a>
<a class="sourceLine" id="cb335-4" data-line-number="4"></a>
<a class="sourceLine" id="cb335-5" data-line-number="5">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb335-6" data-line-number="6">  <span class="kw">cbind</span>(Yes, No) <span class="op">~</span><span class="st"> </span>City <span class="op">+</span><span class="st"> </span>Smoking, </a>
<a class="sourceLine" id="cb335-7" data-line-number="7">  <span class="dt">data =</span> smoking_lungcancer_cn,</a>
<a class="sourceLine" id="cb335-8" data-line-number="8">  <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb335-9" data-line-number="9">)</a>
<a class="sourceLine" id="cb335-10" data-line-number="10"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = cbind(Yes, No) ~ City + Smoking, family = binomial(), 
##     data = smoking_lungcancer_cn)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2178  -0.1484  -0.0001   0.1682   1.3547  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)    0.22838    0.11398    2.00    0.045 *  
## CityShanghai   0.05562    0.11957    0.47    0.642    
## CityShenyang  -0.02774    0.12007   -0.23    0.817    
## CityNanjing    0.00576    0.14091    0.04    0.967    
## CityHarbin     0.01819    0.12947    0.14    0.888    
## CityZhengzhou  0.02878    0.14476    0.20    0.842    
## CityTaiyuan   -0.74568    0.18552   -4.02  5.8e-05 ***
## CityNanchang  -0.05491    0.17100   -0.32    0.748    
## SmokingNo     -0.77706    0.04677  -16.61  &lt; 2e-16 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 310.8951  on 15  degrees of freedom
## Residual deviance:   5.1958  on  7  degrees of freedom
## AIC: 121
## 
## Number of Fisher Scoring iterations: 3</code></pre>
<ol start="2" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb337"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb337-1" data-line-number="1"><span class="co"># Pearson X2 goodness-of-fit test</span></a>
<a class="sourceLine" id="cb337-2" data-line-number="2">df &lt;-<span class="st"> </span><span class="kw">nrow</span>(smoking_lungcancer_cn) <span class="op">-</span><span class="st"> </span><span class="kw">length</span>(<span class="kw">coef</span>(m1))</a>
<a class="sourceLine" id="cb337-3" data-line-number="3">X2 &lt;-<span class="st"> </span><span class="kw">sum</span>(<span class="kw">resid</span>(m1, <span class="dt">type =</span> <span class="st">&quot;pearson&quot;</span>) <span class="op">^</span><span class="st"> </span><span class="dv">2</span>)</a>
<a class="sourceLine" id="cb337-4" data-line-number="4">x2_pvalue &lt;-<span class="st"> </span><span class="dv">1</span><span class="op">-</span><span class="st"> </span><span class="kw">pchisq</span>(X2, df)</a>
<a class="sourceLine" id="cb337-5" data-line-number="5"><span class="kw">c</span>(<span class="dt">X2 =</span> X2, <span class="dt">pvalue =</span> x2_pvalue)</a></code></pre></div>
<pre><code>##     X2 pvalue 
## 5.1999 0.6356</code></pre>
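<p>The Pearson X<sup>2</sup> statistic is 5.2 on 7 degrees of freedom, with a p-value of 0.6356, which is greater than 0.05. There is therefore no evidence of lack of fit, and we consider the model adequate.</p>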
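<p>As a cross-check, the residual deviance can be compared with the same chi-squared reference distribution. The following is a minimal sketch, assuming the counts are re-entered by hand from the <code>ftable</code> output above rather than loaded from the package. The names <code>cities</code>, <code>lung</code> and <code>fit</code> are illustrative and do not appear in the original solution.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Counts copied from the ftable output above (assumption: entered by hand).
cities &lt;- c("Beijing", "Shanghai", "Shenyang", "Nanjing",
            "Harbin", "Zhengzhou", "Taiyuan", "Nanchang")
lung &lt;- data.frame(
  City    = factor(rep(cities, each = 2), levels = cities),
  Smoking = factor(rep(c("Yes", "No"), times = 8), levels = c("Yes", "No")),
  # Lung cancer cases, ordered as (smoker, non-smoker) within each city
  Yes     = c(126, 35, 908, 497, 913, 336, 235, 58,
              402, 121, 182, 72, 60, 11, 104, 21),
  # Controls, in the same order
  No      = c(100, 61, 688, 807, 747, 598, 172, 121,
              308, 215, 156, 98, 99, 43, 89, 36)
)

# Same main-effects logistic model as m1
fit &lt;- glm(cbind(Yes, No) ~ City + Smoking, data = lung, family = binomial())

# Deviance (likelihood-ratio) goodness-of-fit statistic and its p-value
G2 &lt;- deviance(fit)
df &lt;- df.residual(fit)
c(G2 = G2, df = df, pvalue = pchisq(G2, df, lower.tail = FALSE))</code></pre></div>
<p>The <code>summary(m1)</code> output reports a residual deviance of 5.1958 on 7 degrees of freedom, so this statistic is almost identical to the Pearson X<sup>2</sup> value and supports the same conclusion.</p>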
<ol start="3" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb339"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb339-1" data-line-number="1"><span class="kw">rstandard</span>(m1)</a></code></pre></div>
<pre><code>##         1         2         3         4         5         6 
##  0.038865 -0.038875 -0.247104  0.247059  0.001264 -0.001264 
##         7         8         9        10        11        12 
##  1.486229 -1.497384  0.500428 -0.501198 -1.708291  1.697803 
##        13        14        15        16 
##  0.229398 -0.231070 -0.268310  0.267550</code></pre>
<div class="sourceCode" id="cb341"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb341-1" data-line-number="1"><span class="kw">range</span>(<span class="kw">rstandard</span>(m1))</a></code></pre></div>
<pre><code>## [1] -1.708  1.698</code></pre>
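<p>The standardized residuals all lie between -1.7 and 1.7. None exceeds 2 in absolute value, so the residuals are within the normal range and give no indication of lack of fit.</p>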
</div>
<div id="第28题" class="section level3 unnumbered">
<h3>Problem 28</h3>
<div class="sourceCode" id="cb343"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb343-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb343-2" data-line-number="2"><span class="co"># (a)</span></a>
<a class="sourceLine" id="cb343-3" data-line-number="3"><span class="kw">samplesize_prop</span>(<span class="fl">0.2</span>, <span class="fl">0.3</span>, <span class="fl">0.1</span>, <span class="fl">0.2</span>)</a></code></pre></div>
<pre><code>## [1] 229</code></pre>
<div class="sourceCode" id="cb345"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb345-1" data-line-number="1"><span class="co"># (b)(i)</span></a>
<a class="sourceLine" id="cb345-2" data-line-number="2"><span class="kw">samplesize_prop</span>(<span class="fl">0.2</span>, <span class="fl">0.3</span>, <span class="fl">0.1</span>, <span class="fl">0.1</span>)</a></code></pre></div>
<pre><code>## [1] 317</code></pre>
<div class="sourceCode" id="cb347"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb347-1" data-line-number="1"><span class="co"># (b)(ii)</span></a>
<a class="sourceLine" id="cb347-2" data-line-number="2"><span class="kw">samplesize_prop</span>(<span class="fl">0.2</span>, <span class="fl">0.3</span>, <span class="fl">0.05</span>, <span class="fl">0.2</span>)</a></code></pre></div>
<pre><code>## [1] 291</code></pre>
<div class="sourceCode" id="cb349"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb349-1" data-line-number="1"><span class="co"># (b)(iii)</span></a>
<a class="sourceLine" id="cb349-2" data-line-number="2"><span class="kw">samplesize_prop</span>(<span class="fl">0.2</span>, <span class="fl">0.3</span>, <span class="fl">0.05</span>, <span class="fl">0.1</span>)</a></code></pre></div>
<pre><code>## [1] 389</code></pre>
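<p>The source does not show how <code>samplesize_prop</code> is defined. The sketch below assumes it implements the usual normal-approximation sample size for comparing two proportions, n = (z<sub>&alpha;/2</sub> + z<sub>&beta;</sub>)<sup>2</sup> [&pi;<sub>1</sub>(1 - &pi;<sub>1</sub>) + &pi;<sub>2</sub>(1 - &pi;<sub>2</sub>)] / (&pi;<sub>1</sub> - &pi;<sub>2</sub>)<sup>2</sup> per group, rounded up. It also assumes the third argument is the two-sided significance level &alpha; and the fourth is the type II error probability &beta;. The function name <code>samplesize_two_prop</code> is hypothetical and is not part of <code>cdabookdb</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical re-implementation for illustration; not the cdabookdb source.
samplesize_two_prop &lt;- function(pi1, pi2, alpha, beta) {
  z_alpha &lt;- qnorm(1 - alpha / 2)  # critical value for a two-sided test
  z_beta  &lt;- qnorm(1 - beta)       # quantile giving power 1 - beta
  n &lt;- (z_alpha + z_beta)^2 *
    (pi1 * (1 - pi1) + pi2 * (1 - pi2)) / (pi1 - pi2)^2
  ceiling(n)                       # round up to whole subjects per group
}

# Parts (a), (b)(i), (b)(ii), (b)(iii)
c(
  samplesize_two_prop(0.2, 0.3, 0.10, 0.2),
  samplesize_two_prop(0.2, 0.3, 0.10, 0.1),
  samplesize_two_prop(0.2, 0.3, 0.05, 0.2),
  samplesize_two_prop(0.2, 0.3, 0.05, 0.1)
)</code></pre></div>
<p>Under these assumptions the sketch reproduces the four values printed above (229, 317, 291 and 389). It also makes the trade-off visible: lowering either &alpha; or &beta; increases the required sample size per group.</p>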

</div>
</div>
</div>
            </section>

          </div>
        </div>
      </div>
<a href="logistic-regression.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="multi-logit-model.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
    </div>
  </div>
<script src="libs/gitbook/js/app.min.js"></script>
<script src="libs/gitbook/js/lunr.js"></script>
<script src="libs/gitbook/js/plugin-search.js"></script>
<script src="libs/gitbook/js/plugin-sharing.js"></script>
<script src="libs/gitbook/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook/js/plugin-bookdown.js"></script>
<script src="libs/gitbook/js/jquery.highlight.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": true,
"facebook": false,
"twitter": false,
"google": false,
"linkedin": false,
"weibo": false,
"instapper": false,
"vk": false,
"all": ["facebook", "google", "twitter", "linkedin", "weibo", "instapaper"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": "https://github.com/jinzhen-lin/cdacode-document/edit/master/05-build-and-apply-logistic-model.Rmd",
"text": "Edit"
},
"download": ["cdacode.pdf", "cdacode.epub", "cdacode.zip"],
"toc": {
"collapse": "section"
}
});
});
</script>

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    var src = "true";
    if (src === "" || src === "true") src = "https://cdn.bootcss.com/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_CHTML";
    if (location.protocol !== "file:" && /^https?:/.test(src))
      src = src.replace(/^https?:/, '');
    script.src = src;
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
</body>

</html>
